Including dynamic and phonetic information in voice conversion systems
Authors
Abstract
Voice Conversion (VC) systems modify a speaker's voice (the source speaker) so that it is perceived as if another speaker (the target speaker) had uttered it. Previously published VC approaches using Gaussian Mixture Models (GMMs) [1] perform the conversion on a frame-by-frame basis using only spectral information. In this paper, two new approaches are studied in order to extend GMM-based VC systems. First, dynamic information is used to build the speaker acoustic model, so that the transformation is carried out according to sequences of frames. Then, phonetic information is introduced in the training of the VC system. Objective and perceptual results compare the performance of the proposed systems.
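As an illustration of the frame-by-frame baseline mentioned in the abstract, the following is a minimal sketch of a commonly used GMM regression mapping, in which each source frame is converted independently as a posterior-weighted sum of per-component linear transforms. It is not taken from the paper: the function names (`gaussian_logpdf`, `convert_frames`), the parameter layout, and the assumption that a joint source/target GMM has already been trained are all hypothetical choices made for this example.

```python
import numpy as np


def gaussian_logpdf(X, mean, cov):
    """Log-density of each row of X under a full-covariance Gaussian."""
    d = X.shape[1]
    diff = X - mean
    chol = np.linalg.cholesky(cov)              # cov = chol @ chol.T
    z = np.linalg.solve(chol, diff.T)           # whitened residuals, shape (d, n)
    maha = np.sum(z ** 2, axis=0)               # squared Mahalanobis distance per frame
    logdet = 2.0 * np.sum(np.log(np.diag(chol)))
    return -0.5 * (d * np.log(2.0 * np.pi) + logdet + maha)


def convert_frames(X, weights, mu_x, mu_y, cov_xx, cov_yx):
    """Convert source frames X (n, dx) one frame at a time.

    Assumed (hypothetical) parameter layout for an M-component GMM:
      weights (M,), mu_x (M, dx), mu_y (M, dy),
      cov_xx (M, dx, dx), cov_yx (M, dy, dx).
    """
    n_components = len(weights)
    # Posterior probability of each component given each source frame.
    log_post = np.stack(
        [np.log(weights[i]) + gaussian_logpdf(X, mu_x[i], cov_xx[i])
         for i in range(n_components)],
        axis=1,
    )
    log_post -= log_post.max(axis=1, keepdims=True)   # numerical stability
    post = np.exp(log_post)
    post /= post.sum(axis=1, keepdims=True)

    Y = np.zeros((X.shape[0], mu_y.shape[1]))
    for i in range(n_components):
        # Per-component linear regression from source to target space.
        A = cov_yx[i] @ np.linalg.inv(cov_xx[i])
        Y += post[:, [i]] * (mu_y[i] + (X - mu_x[i]) @ A.T)
    return Y
```

Because each row of `X` is mapped without reference to its neighbours, this sketch uses only the spectral content of the current frame. How the paper incorporates dynamic and phonetic information is not specified in the abstract, so no attempt is made to reproduce it here.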
Similar resources
Using Context-based Statistical Models to Promote the Quality of Voice Conversion Systems
This article examines methods for improving the performance of GMM-based voice conversion systems, taking the GMM method as the baseline to be improved. Current methods use a single conversion function to convert all speech units, and the statistical averaging this involves leads to spectral smoothing, so we will observe quality ...
A phonetic assessment of cross-language voice conversion
Cross-language voice conversion maps the speech of speaker S1 in language L1 to the voice of speaker S2 using knowledge only of how S2 speaks a different language L2. This mapping is usually performed using speech material from S1 and S2 that has been deemed “equivalent” in either acoustic or phonetic terms. This study investigates the issue of equivalence in more detail, and contrasts the perf...
A phonetic alternative to cross-language voice conversion in a text-dependent context: evaluation of speaker identity
Spoken language conversion (SLC) aims to generate utterances in the voice of a speaker but in a language unknown to them, using speech synthesis systems and speech processing techniques. Previous approaches to SLC have been based on cross-language voice conversion (VC), which has underlying assumptions that ignore phonetic and phonological differences between languages, leading to a reduction i...
Phoneme-Discriminative Features for Dysarthric Speech Conversion
We present in this paper a Voice Conversion (VC) method for a person with dysarthria resulting from athetoid cerebral palsy. VC is being widely researched in the field of speech processing because of increased interest in using such processing in applications such as personalized Text-To-Speech systems. A Gaussian Mixture Model (GMM)-based VC method has been widely researched and Partial Least ...
Generative Acoustic-Phonemic-Speaker Model Based on Three-Way Restricted Boltzmann Machine
In this paper, we discuss a way of modeling speech signals based on a three-way restricted Boltzmann machine (3WRBM) that automatically separates phonetic-related information and speaker-related information from an observed signal. The proposed model is an energy-based probabilistic model that includes three-way potentials of three variables: acoustic features, latent phonetic features, and speak...